Abstract

We consider the problem of modulation and estimation of a random parameter U to be 
conveyed across a discrete memoryless channel. Upper and lower bounds are derived for the best 
achievable exponential decay rate of a general moment of the estimation error, $E|\hat{U} - U|^\rho$, $\rho > 0$,
when both the modulator and the estimator are subjected to optimization. These exponential 
error bounds turn out to be intimately related to error exponents of channel coding and to 
channel capacity. While in general, there is some gap between the upper and the lower bound, 
they asymptotically coincide both for very small and for very large values of the moment power
$\rho$. This means that our achievability scheme, which is based on simple quantization of $U$ followed
by channel coding, is nearly optimum in both limits. Some additional properties of the bounds 
are discussed and demonstrated, and finally, an extension to the case of a multidimensional 
parameter vector is outlined, with the principal conclusion that our upper and lower bounds
also coincide asymptotically for high dimensionality.

Index Terms: Parameter estimation, modulation, discrete memoryless channels, error exponents,
random coding, data processing theorem.
1 Introduction

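Consider the problem of conveying the value of a parameter $u$ across a given discrete memoryless
channel (DMC)

$$p(\mathbf{y}|\mathbf{x}) = \prod_{t=1}^{n} p(y_t|x_t), \qquad (1)$$

where $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ are the channel input and output vectors, respectively.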
Our main interest, in this work, is in the following questions: How well can one estimate u based 
on y when one is allowed to optimize, not only the estimator, but also the modulator, that is, the 
function $\mathbf{x}(u) = (x_1(u), \ldots, x_n(u))$ that maps $u$ into a channel input vector? How fast does the
estimation error decay as a function of n when the best modulator and estimator are used? 

In principle, this problem, which is the discrete-time analogue of the classical problem of
"waveform communication" (in the terminology of [15, Chap. 8]), can be viewed both from the
information-theoretic and the estimation-theoretic perspectives. Classical results in neither of 
these disciplines, however, seem to suggest satisfactory answers. 

From the information-theoretic point of view, if the parameter is random, call it U, this is 
actually a problem of joint source-channel coding, where the source emits a single variable U (or a 
fixed number of them when $U$ is a vector), whereas the channel is allowed to be used many times ($n$
is large). The separation theorem of classical information theory asserts that asymptotic optimality
of separate source- and channel coding is guaranteed in the limit of long blocks. However, it refers 
to a regime of long blocks both in source coding and channel coding, whereas here the source block 
length is 1, and so, there is no hope to compress the source with performance that comes close to 
the rate-distortion function. 

In the realm of estimation theory, on the other hand, there is a rich literature on Bayesian and 
non-Bayesian bounds, mostly concerning the mean square error (MSE) in estimating parameters 
from signals corrupted by an additive white Gaussian noise (AWGN) channel, as well as other 
channels (see, e.g., [12] and the introductions of [1], [2], and [H] for overviews on these bounds). 
Most of these bounds lend themselves to calculation for a given modulator x{u) and therefore 
they may give insights concerning optimum estimation for this specific modulator. They may not, 
however, be easy to use for the derivation of universal lower bounds, namely, lower bounds that 
depend neither on the modulator nor on the estimator, which are relevant when both optimum
modulators and optimum estimators are sought. Two exceptions to this rule (although usually
not presented as such) are families of bounds that stem from generalized data processing theorems
(DPT's) [5], [6], [11], [16], [18], henceforth referred to as "DPT bounds", and bounds based on
hypothesis testing and channel coding considerations [3], [17], henceforth called "channel-coding
bounds."

In this paper, we use both channel-coding techniques and DPT techniques in order to derive
lower bounds on general moments of the estimation error, $E|\hat{U} - U|^\rho$, where $U$ is a random parameter,
$\hat{U}$ is its estimate, and the power $\rho$ is an arbitrary positive real (not necessarily an integer). It
turns out that when $\mathbf{x}(u)$ is subjected to optimization, $E|\hat{U} - U|^\rho$ can decay exponentially rapidly
as a function of $n$, and so, our focus is on the best achievable exponential rate of decay as a function
of $\rho$, which we shall denote by $\mathcal{E}(\rho)$, that is,

$$\inf E|\hat{U} - U|^\rho \approx e^{-n\mathcal{E}(\rho)}, \qquad (2)$$

where the infimum is over all modulators and estimators.^1 Interestingly, both the upper and
lower bounds on $\mathcal{E}(\rho)$ are intimately related to well-known exponential error bounds associated
with channel coding, such as Gallager's random coding exponent (for small values of $\rho$) and the
expurgated exponent function (for large values of $\rho$). In other words, we establish an estimation-theoretic
meaning to these error exponent functions. In particular, under certain conditions, our
channel-coding upper bound on $\mathcal{E}(\rho)$ (corresponding to a lower bound on $E|\hat{U} - U|^\rho$) can be
presented as

$$\overline{E}(\rho) = \begin{cases} E_0(\rho) & \rho < \rho_0 \\ E_{ex}(0) & \rho \ge \rho_0 \end{cases} \qquad (3)$$

where $E_0(\rho) = \max_q E_0(\rho, q)$, $E_0(\rho, q)$ being Gallager's function, $E_{ex}(0)$ is the expurgated exponent
at zero rate, and $\rho_0$ is the value of $\rho$ for which $E_0(\rho) = E_{ex}(0)$ (so that $\overline{E}(\rho)$ is continuous). In addition,
we derive a DPT bound and discuss its advantages and disadvantages compared to the above bound.

We also suggest a lower bound, $\underline{E}(\rho)$, on $\mathcal{E}(\rho)$ (associated with upper bounds on $\inf E|\hat{U} - U|^\rho$),
which is achieved by a simple, separation-based modulation and estimation scheme. While there
is a certain gap between $\overline{E}(\rho)$ and $\underline{E}(\rho)$ for every finite $\rho$, it turns out that this gap disappears (in
the sense that the ratio $\underline{E}(\rho)/\overline{E}(\rho)$ tends to unity) both for large $\rho$ and for small $\rho$, and so, we have



^1 This is still an informal and non-rigorous description. More precise definitions will be given in the sequel.
exact asymptotics of $\mathcal{E}(\rho)$ in these two extremes: For large $\rho$, $\mathcal{E}(\rho)$ tends to $E_{ex}(0)$ and for small
$\rho$, $\mathcal{E}(\rho) \sim \rho C$, where $C$ is the channel capacity. Our simple achievability scheme is then nearly
optimum at both extremes, which means that a separation theorem essentially holds for very small
and for very large values of p, in spite of the earlier discussion (see also ^ Section III.D]). The 
results are demonstrated for the example of a "very noisy channel," [4, Example 3, pp. 147-149],
[13, pp. 155-158], which is convenient to analyze, as it admits closed-form expressions.

Finally, we suggest an extension of our results to the case of a multidimensional parameter 
vector $\mathbf{U} = (U_1, \ldots, U_d)$. It turns out that the effect of the dimension $d$ is in reducing the effective
value of $\rho$ by a factor of $d$. In other words, $\overline{E}(\rho)$ is replaced by $\overline{E}(\rho/d)$ and the extension of the
achievability result is straightforward. This means that for fixed $\rho$, the limit of large $d$ (where the
effective value $\rho/d$ is very small) also admits exact asymptotics, where $\mathcal{E}(\rho) \sim \rho C/d$.

The outline of the paper is as follows. In Section 2, we define the problem formally and we 
establish notation conventions. In Section 3, we derive our main upper and lower bounds based on 
channel coding considerations. In Section 4, we derive our DPT bound and discuss it. Section 5 
is devoted to the example of the very noisy channel, and finally, in Section 6 the multidimensional 
case is considered. 

2 Notation Conventions and Problem Formulation 

Throughout this paper, scalar random variables (RV's) will be denoted by capital letters, their 
sample values will be denoted by the respective lower case letters, and their alphabets will be 
denoted by the respective calligraphic letters. A similar convention will apply to random vectors and
their sample values, which will be denoted with the same symbols in a boldface font. For example, $y \in \mathcal{Y}$
is a realization of a random variable $Y$, whereas $\mathbf{y} = (y_1, \ldots, y_n) \in \mathcal{Y}^n$ ($n$ being a positive integer
and $\mathcal{Y}^n$ being the $n$-th Cartesian power of $\mathcal{Y}$) is a realization of a random vector $\mathbf{Y} = (Y_1, \ldots, Y_n)$.

Let $U$ be a uniformly distributed^2 random variable over the interval $[-1/2, +1/2]$, which we
will also denote by $\mathcal{U}$. We refer to $U$ as the parameter to be conveyed from the source to the
destination, via a given noisy channel. A given realization of $U$ will be denoted by $u$.

^2 This specific assumption concerning the density of $U$ and its support is made for convenience only. Our results
extend to more general densities. 
A discrete memoryless channel (DMC) is characterized by a matrix of conditional probabilities
$p = \{p(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$, where the channel input and output alphabets, $\mathcal{X}$ and $\mathcal{Y}$, are assumed
finite.^3 When a DMC $p = \{p(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$ is fed by an input vector $\mathbf{x} \in \mathcal{X}^n$, it produces
an output vector $\mathbf{y} \in \mathcal{Y}^n$ according to

$$p(\mathbf{y}|\mathbf{x}) = \prod_{t=1}^{n} p(y_t|x_t). \qquad (4)$$

A modulator is a measurable mapping $\mathbf{x} = f_n(u)$ from $\mathcal{U} = [-1/2, +1/2]$ to $\mathcal{X}^n$ and an estimator
is a mapping $\hat{u} = g_n(\mathbf{y})$ from $\mathcal{Y}^n$ back to $\mathcal{U}$. The random vector $f_n(U)$ will also be denoted by
$\mathbf{X}$. Similarly, the random variable $g_n(\mathbf{Y})$ will also be denoted by $\hat{U}$. Our basic figure of merit for
communication systems is the expectation of the $\rho$-th power of the estimation error, i.e., $E\{|\hat{U} - U|^\rho\}$,
where $\rho$ is a positive real (not necessarily an integer) and $E$ is the expectation operator with
respect to (w.r.t.) the randomness of $U$ and $\mathbf{Y}$. The capability of attaining an exponential decay
in $E\{|\hat{U} - U|^\rho\}$ by certain choices of a modulator $f_n$ and an estimator $g_n$ motivates the definition
of the following exponential rates:



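$$\overline{\mathcal{E}}(\rho) = \limsup_{n \to \infty} \left[ -\frac{1}{n} \ln \left( \inf_{f_n, g_n} E\{|\hat{U} - U|^\rho\} \right) \right] \qquad (5)$$

and

$$\underline{\mathcal{E}}(\rho) = \liminf_{n \to \infty} \left[ -\frac{1}{n} \ln \left( \inf_{f_n, g_n} E\{|\hat{U} - U|^\rho\} \right) \right]. \qquad (6)$$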



This paper is basically about the derivation of upper bounds on $\overline{\mathcal{E}}(\rho)$ and lower bounds on $\underline{\mathcal{E}}(\rho)$,
with special interest in situations where these upper and lower bounds come close to each other.

3 Upper and Lower Bounds Based on Channel Coding 



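Let $q = \{q(x),\ x \in \mathcal{X}\}$ be a given probability vector of a random variable $X$ taking on values in
$\mathcal{X}$, and let $p = \{p(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$ define the given DMC. Let $E_0(\varrho, q)$ be the Gallager function
[4, p. 138, eq. (5.6.14)], [13, p. 133, eq. (3.1.18)], defined as

$$E_0(\varrho, q) = -\ln \left[ \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} q(x)\, p(y|x)^{1/(1+\varrho)} \right)^{1+\varrho} \right], \qquad \varrho \ge 0. \qquad (7)$$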
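As an illustration of eq. (7), the following is a minimal Python sketch that evaluates the Gallager function for a finite-alphabet DMC. The function name `gallager_e0`, the helper `bsc`, and the row-per-input layout of the transition matrix `P` are assumptions made for this example, not notation from the paper.

```python
import math

def gallager_e0(varrho, q, P):
    """Gallager function E_0(varrho, q) of eq. (7).

    q[x] is the input distribution; P[x][y] = p(y|x) (assumed layout).
    """
    s = 1.0 / (1.0 + varrho)
    total = 0.0
    for y in range(len(P[0])):
        # inner sum over the input alphabet for this output letter
        inner = sum(q[x] * P[x][y] ** s for x in range(len(q)))
        total += inner ** (1.0 + varrho)
    return -math.log(total)

def bsc(eps):
    """Transition matrix of a binary symmetric channel (illustrative helper)."""
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]
```

For a binary symmetric channel with uniform inputs, the result can be compared against the closed form $\varrho \ln 2 - (1+\varrho) \ln[\epsilon^{1/(1+\varrho)} + (1-\epsilon)^{1/(1+\varrho)}]$, where $\epsilon$ is the crossover probability.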



^3 The finite alphabet assumption is used mainly for reasons of simplicity. The extension to continuous alphabets
is possible, though some caution should be exercised at several places. 
Next, we define

$$E_0(\varrho) = \max_q E_0(\varrho, q), \qquad (8)$$

where the maximum is over the entire simplex of probability vectors, and let $\bar{E}_0(\varrho)$ be the upper
concave envelope^4 (UCE) of $E_0(\varrho)$. Next define



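$$E_x(\varrho) = \max_q \left\{ -\varrho \ln \left( \sum_{x, x' \in \mathcal{X}} q(x) q(x') \left[ \sum_{y \in \mathcal{Y}} \sqrt{p(y|x)\, p(y|x')} \right]^{1/\varrho} \right) \right\}, \qquad (9)$$

where the parameter $\varrho$ should be distinguished from the power $\rho$ of the estimation error in discussion.
The expurgated exponent function [4, p. 153, eq. (5.7.11)], [13, p. 146, eq. (3.3.13)] is defined
as

$$E_{ex}(R) = \sup_{\varrho \ge 1} [E_x(\varrho) - \varrho R]. \qquad (10)$$

It is well known (and a straightforward exercise to show) that

$$E_{ex}(0) = \sup_{\varrho \ge 1} E_x(\varrho) = \lim_{\varrho \to \infty} E_x(\varrho) = \max_q \left\{ -\sum_{x, x' \in \mathcal{X}} q(x) q(x') \ln \left[ \sum_{y \in \mathcal{Y}} \sqrt{p(y|x)\, p(y|x')} \right] \right\}. \qquad (11)$$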
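The limit in eq. (11) can be checked numerically. The sketch below, for a fixed input distribution `q` (no maximization over `q` is attempted), evaluates the expurgated function and its zero-rate limit. The function names and the matrix layout `P[x][y] = p(y|x)` are assumptions of this example, and it further assumes that every pair of inputs has a strictly positive Bhattacharyya sum, so that the logarithm is finite.

```python
import math

def bhattacharyya(P, x, xp):
    """Sum over y of sqrt(p(y|x) p(y|x'))."""
    return sum(math.sqrt(P[x][y] * P[xp][y]) for y in range(len(P[0])))

def e_x(varrho, q, P):
    """Expurgated function for a fixed input distribution q, varrho >= 1."""
    n = len(q)
    s = sum(q[x] * q[xp] * bhattacharyya(P, x, xp) ** (1.0 / varrho)
            for x in range(n) for xp in range(n))
    return -varrho * math.log(s)

def e_ex_zero(q, P):
    """Zero-rate expurgated exponent for a fixed q (limit varrho -> infinity)."""
    n = len(q)
    return -sum(q[x] * q[xp] * math.log(bhattacharyya(P, x, xp))
                for x in range(n) for xp in range(n))
```

For a binary symmetric channel with crossover probability 0.1 and uniform inputs, `e_x` increases with `varrho` and approaches `e_ex_zero`, which equals $-\frac{1}{4}\ln(4 \cdot 0.1 \cdot 0.9)$.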



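Finally, define

$$\overline{E}(\rho) = \begin{cases} \bar{E}_0(\rho) & \rho < \rho_0 \\ E_{ex}(0) & \rho \ge \rho_0 \end{cases} \qquad (12)$$

where $\rho_0$ is the (unique) solution to the equation $\bar{E}_0(\rho) = E_{ex}(0)$.

Our first theorem (see Appendix A for the proof) asserts that $\overline{E}(\rho)$ is an upper bound on the
best achievable exponential decay rate of the $\rho$-th moment of the estimation error.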

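Theorem 1 Let $U$ be uniformly distributed over $\mathcal{U} = [-1/2, +1/2]$ and let $p = \{p(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$
be a given DMC. Then, for every $\rho > 0$,

$$\overline{\mathcal{E}}(\rho) \le \overline{E}(\rho). \qquad (13)$$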



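We now proceed to present a lower bound $\underline{E}(\rho)$ to $\underline{\mathcal{E}}(\rho)$. Let $R_-$ be the smallest $R$ such that
$E_{ex}(R)$ is attained with $\varrho = 1$ and let $R_+$ denote the largest $R$ such that

$$E_r(R) = \max_{0 \le \varrho \le 1} [E_0(\varrho) - \varrho R] \qquad (14)$$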



^4 While the Gallager function $E_0(\varrho, q)$ is known to be concave in $\varrho$ for every fixed $q$ [13, p. 134, eq. (3.2.5a)], we are
not aware of an argument asserting that $E_0(\varrho)$ is concave in general. On the other hand, there are many situations
where $E_0(\varrho)$ is, in fact, concave and then $\bar{E}_0(\varrho) = E_0(\varrho)$, for example, when the achiever $q^*$ of $\max_q E_0(\varrho, q)$ is
independent of $\varrho$, like the case of the binary input output-symmetric (BIOS) channel [13, p. 153].
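is attained for $\varrho = 1$.^5 Next, define

$$\rho_+ = \frac{E_0(1) - R_+}{R_+} \qquad (15)$$

$$\rho_- = \frac{E_0(1) - R_-}{R_-} \qquad (16)$$

and finally,

$$\underline{E}(\rho) = \begin{cases} \sup_{0 \le \varrho \le 1} \rho E_0(\varrho)/(\varrho + \rho) & \rho < \rho_+ \\ \rho E_0(1)/(1 + \rho) = \rho E_x(1)/(1 + \rho) & \rho_+ \le \rho \le \rho_- \\ \sup_{\varrho \ge 1} \rho E_x(\varrho)/(\varrho + \rho) & \rho > \rho_- \end{cases} \qquad (17)$$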



Our next theorem (see Appendix B for the proof) tells us that $\underline{E}(\rho)$ is a lower bound on the best
attainable exponential decay rate of $E\{|\hat{U} - U|^\rho\}$.

Theorem 2 Let $U$ be uniformly distributed over $\mathcal{U} = [-1/2, +1/2]$ and let $p = \{p(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$
be a given DMC. Then, for every $\rho > 0$,

$$\underline{\mathcal{E}}(\rho) \ge \underline{E}(\rho). \qquad (18)$$

The derivations of both $\overline{E}(\rho)$ and $\underline{E}(\rho)$ rely on channel coding considerations. In particular,
the derivation of $\overline{E}(\rho)$ builds strongly on the method of [7], which extends the derivation of the
Ziv-Zakai bound [17] and the Chazan-Zakai-Ziv bound [3]. While the two latter bounds are based
on considerations associated with binary hypothesis testing, here and in [7], the general idea is
extended to exponentially many hypotheses pertaining to channel decoding.

We see that both bounds exhibit different types of behavior in different ranges of $\rho$ (i.e., "phase
transitions"), but in a different manner. For both $\overline{E}(\rho)$ and $\underline{E}(\rho)$ the behavior is related to the
ordinary Gallager function in some range of small $\rho$, and to the expurgated exponent in a certain
range of large $\rho$.

^5 For example, in the case of the BSC with a crossover parameter $p$, $R_- = \ln 2 - h_2(Z/(1+Z))$, with $Z = \sqrt{4p(1-p)}$,
and $R_+ = \ln 2 - h_2(\sqrt{p}/(\sqrt{p} + \sqrt{1-p}))$, where $h_2(x) = -x \ln x - (1-x) \ln(1-x)$ [13, pp. 151-152].

As can be seen in the proof of Theorem 2 (Appendix B), the communication system that achieves
$\underline{E}(\rho)$ works as follows (see also [7], [8]): Define

$$R(\rho) = \frac{\underline{E}(\rho)}{\rho}. \qquad (19)$$
Construct a uniform grid of $M = e^{nR(\rho)}/2$ evenly spaced points along $\mathcal{U}$, denoted $\{u_1, u_2, \ldots, u_M\}$.
If $\rho > \rho_-$, assign to each grid point $u_i$ a codeword of a code of rate $R(\rho)$ that achieves the expurgated
exponent $E_{ex}[R(\rho)]$ (see [4, Theorem 5.7.1] or [13, Theorem 3.3.1]). If $\rho \le \rho_-$, do the same with a
code that achieves $E_r[R(\rho)]$ (see [4, p. 139, Corollary 1] or [13, Theorem 3.2.1]). Given $u$, let $f_n(u)$
be the codeword $\mathbf{x}_i$ that is assigned to the grid point $u_i$, which is closest to $u$. Given $\mathbf{y}$, let $g_n(\mathbf{y})$
be the grid point $u_j$ that corresponds to the codeword $\mathbf{x}_j$ that has been decoded based on $\mathbf{y}$ using
the ML decoder for the given DMC.
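The quantization part of this scheme can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the channel code and the ML decoder are abstracted away (the estimator simply receives a decoded index), the grid points are taken as the centers of $M$ equal cells of $[-1/2, +1/2]$, and all function names are hypothetical.

```python
import math

def grid_size(n, rate):
    """Number of grid points, M = floor(e^{n*rate} / 2), at least 1."""
    return max(1, int(math.exp(n * rate) / 2))

def modulate_index(u, M):
    """Index of the grid point closest to u in [-1/2, +1/2].

    In the full scheme this index selects the codeword to transmit.
    """
    i = int(math.floor((u + 0.5) * M))
    return min(M - 1, max(0, i))

def grid_point(i, M):
    """Center of the i-th of M equal cells that partition [-1/2, +1/2]."""
    return -0.5 + (i + 0.5) / M

def estimate(decoded_index, M):
    """Estimator: the grid point associated with the decoded codeword index."""
    return grid_point(decoded_index, M)
```

When the decoder returns the transmitted index, the estimation error is at most half the grid spacing, $1/(2M)$, which is of the order of $e^{-nR(\rho)}$. A decoding error can cost an error of order one, which is why the decoding error probability enters the bound.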

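Let us examine the behavior of these bounds as $\rho \to 0$ and as $\rho \to \infty$. For very large values of
$\rho$, where the upper bound $\overline{E}(\rho)$ is obviously given by $E_{ex}(0)$, the lower bound is given by

$$\begin{aligned}
\lim_{\rho \to \infty} \underline{E}(\rho) &= \lim_{\rho \to \infty} \sup_{\varrho \ge 1} \frac{\rho E_x(\varrho)}{\varrho + \rho} \qquad (20) \\
&\ge \lim_{\rho \to \infty} \frac{\rho E_x(\sqrt{\rho})}{\sqrt{\rho} + \rho} \qquad (21) \\
&= \lim_{\rho \to \infty} E_x(\sqrt{\rho}) = E_{ex}(0), \qquad (22)
\end{aligned}$$

which means that for large $\rho$ all the exponents asymptotically coincide:

$$\lim_{\rho \to \infty} \underline{E}(\rho) = \lim_{\rho \to \infty} \underline{\mathcal{E}}(\rho) = \lim_{\rho \to \infty} \overline{\mathcal{E}}(\rho) = \lim_{\rho \to \infty} \overline{E}(\rho) = E_{ex}(0). \qquad (23)$$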

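In the achievability scheme described above, $R(\rho)$ is a very low coding rate in this regime. On the other hand,
for very small values of $\rho$, where $\overline{E}(\rho) = \bar{E}_0(\rho) = \rho C + o(\rho)$, $C$ being the channel capacity, we have

$$\begin{aligned}
\lim_{\rho \to 0} \frac{\underline{E}(\rho)}{\rho} &= \lim_{\rho \to 0} \sup_{0 \le \varrho \le 1} \frac{E_0(\varrho)}{\varrho + \rho} \qquad (24) \\
&\ge \lim_{\rho \to 0} \frac{E_0(\sqrt{\rho})}{\sqrt{\rho} + \rho} \qquad (25) \\
&= \lim_{\rho \to 0} \frac{E_0(\sqrt{\rho})}{\sqrt{\rho}} \cdot \frac{1}{1 + \sqrt{\rho}} \qquad (26) \\
&= \lim_{\rho \to 0} \frac{E_0(\sqrt{\rho})}{\sqrt{\rho}} = C, \qquad (27)
\end{aligned}$$

which means that for small $\rho$ all the exponents behave like $\rho C$, i.e.,

$$\lim_{\rho \to 0} \frac{\underline{E}(\rho)}{\rho} = \lim_{\rho \to 0} \frac{\underline{\mathcal{E}}(\rho)}{\rho} = \lim_{\rho \to 0} \frac{\overline{\mathcal{E}}(\rho)}{\rho} = \lim_{\rho \to 0} \frac{\overline{E}(\rho)}{\rho} = C. \qquad (28)$$

It is then interesting to observe that not only channel-coding error exponents, but also channel
capacity plays a role in the characterization of the best achievable modulation-estimation performance.
In the achievability scheme described above, $R(\rho)$ is, for small $\rho$, a very high coding rate, very close to
the capacity $C$.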
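The small-$\rho$ behavior can be observed numerically. The sketch below does so for a binary symmetric channel, assuming the uniform input distribution (which, by symmetry, maximizes the Gallager function for this channel). The function names and the grid search over $\varrho$ are choices made for this example only.

```python
import math

def e0_bsc(varrho, eps):
    """Gallager function of a BSC with crossover eps under uniform inputs."""
    s = 1.0 / (1.0 + varrho)
    return varrho * math.log(2.0) - (1.0 + varrho) * math.log(eps ** s + (1.0 - eps) ** s)

def capacity_bsc(eps):
    """Capacity of the BSC in nats."""
    return math.log(2.0) + eps * math.log(eps) + (1.0 - eps) * math.log(1.0 - eps)

def lower_over_rho(rho, eps, steps=1000):
    """Grid-search approximation of sup over 0 < varrho <= 1 of E_0(varrho)/(varrho + rho)."""
    return max(e0_bsc(k / steps, eps) / (k / steps + rho) for k in range(1, steps + 1))
```

With a crossover probability of 0.1, `e0_bsc(rho, 0.1) / rho` approaches `capacity_bsc(0.1)` as `rho` shrinks, and `lower_over_rho(1e-4, 0.1)` is already within a few percent of the capacity.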
4 Upper Bound Based on Data Processing Inequalities



We next derive an alternative upper bound on $\overline{\mathcal{E}}(\rho)$ that is based on generalized data processing
inequalities, following Ziv and Zakai [18] and Zakai and Ziv [16]. The idea behind these works
is that it is possible to define generalized mutual information functionals satisfying a DPT, by
replacing the negative logarithm function of the ordinary mutual information by a general convex
function. This makes it possible to obtain tighter distortion bounds for communication systems with short
block length.

In [6] it was shown that the following generalized mutual information functional, between two
generic random variables, $A$ and $B$, admits a DPT for every positive integer $k$ and for every vector
$(\alpha_1, \ldots, \alpha_k)$ whose components are non-negative and sum to unity:

$$I(A; B) = -\sum_{b} \prod_{i=1}^{k} \left[ \sum_{a} p(a)\, p(b|a)^{\alpha_i} \right]. \qquad (29)$$

In particular, since $U \to \mathbf{Y} \to \hat{U}$ is a Markov chain, then by the generalized DPT,

$$I(U; \hat{U}) \le I(U; \mathbf{Y}). \qquad (30)$$

The idea is to further upper bound $I(U; \mathbf{Y})$ and to further lower bound $I(U; \hat{U})$ subject to the
constraint $E|\hat{U} - U|^\rho = D$, which leads to a generalized rate-distortion function, and thereby to
obtain an inequality on $E|\hat{U} - U|^\rho$. Specifically, $I(U; \mathbf{Y})$ is upper bounded as follows:

f+l/2 

IiU;Y) = -EIT/ du,p{y\fniui)r (31) 



i=i 



k ^+1/2 " 



du,llp{yt\[fn{u^)]tr 
,tj/.-.j=i "1/2 t=l 



(32) 



n k ,.4^1/2 



riEIl/ duMytlifnMW' (33) 



t=lyeyi=l 



'1/2 
n k 

< -mmY[Y.ll^q{xi)p{yt\xir' (34) 



t=l yey i=l Xi&X 



mm 



k 

En E i(^iMy\^ 

yey 1=1 XiGX 



(35) 



exp{— nmaxi?(ai, . . . , afc, g)}, (36) 
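where $[f_n(u_i)]_t$ denotes the $t$-th component of the vector $\mathbf{x} = f_n(u_i)$ and where

$$E(\alpha_1, \ldots, \alpha_k, q) = -\ln \left[ \sum_{y \in \mathcal{Y}} \prod_{i=1}^{k} \sum_{x \in \mathcal{X}} q(x)\, p(y|x)^{\alpha_i} \right]. \qquad (37)$$

Note that for $k = 1 + \varrho$ ($\varrho$ integer),

$$E\left( \frac{1}{1+\varrho}, \ldots, \frac{1}{1+\varrho}, q \right) = E_0(\varrho, q). \qquad (38)$$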
In Appendix C we show that 

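$$\min\{I(U; \hat{U}):\ E|\hat{U} - U|^\rho = D\} = R(D) \ge -c \cdot D^{\sum_{i=1}^{k} \zeta_\rho(\alpha_i)}, \qquad (39)$$

where $c$ is a constant that depends solely on $\rho$, $k$ and $\alpha_1, \ldots, \alpha_k$, and where

$$\zeta_\rho(\alpha) = \begin{cases} \alpha & \alpha \le \frac{1}{1+\rho} \\ \frac{1-\alpha}{\rho} & \alpha > \frac{1}{1+\rho} \end{cases} \qquad (40)$$

The function $R(D)$ in eq. (39) is referred to as a "generalized rate-distortion function" in the
terminology of [18] and [16]. Thus, from the generalized DPT,

$$E|\hat{U} - U|^\rho = D \ge c' \cdot e^{-n E_{DPT}(\rho)}, \qquad (41)$$

where $c'$ is another constant and

$$E_{DPT}(\rho) = \inf_{k > 1}\ \inf_{\alpha_1, \ldots, \alpha_k}\ \sup_q \frac{E(\alpha_1, \ldots, \alpha_k, q)}{\sum_{i=1}^{k} \zeta_\rho(\alpha_i)}. \qquad (42)$$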

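As an example, assume that the channel is such that the function $E_0(\varrho)$ is concave, so that $\bar{E}_0(\varrho) =
E_0(\varrho)$. In this case, $\rho_0 \ge 1$ since $E_0(1) \le E_{ex}(0)$ and $E_0(\varrho)$ is monotonically increasing. Now, let
$\rho \le \rho_0$ be an integer (for example, $\rho = 1$ is always a legitimate choice). Then,

$$\begin{aligned}
\overline{E}(\rho) &= E_0(\rho) \qquad (43) \\
&= \sup_q \frac{E(1/(1+\rho), \ldots, 1/(1+\rho), q)}{(1+\rho)\, \zeta_\rho(1/(1+\rho))} \qquad (44) \\
&\ge \inf_{k > 1}\ \inf_{\alpha_1, \ldots, \alpha_k}\ \sup_q \frac{E(\alpha_1, \ldots, \alpha_k, q)}{\sum_{i=1}^{k} \zeta_\rho(\alpha_i)} \qquad (45) \\
&= E_{DPT}(\rho). \qquad (46)
\end{aligned}$$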

Thus, at least in this case, the DPT bound is guaranteed to be no worse than the channel-coding
bound $\overline{E}(\rho)$. Nonetheless, in our numerical studies, we have not found an example where the
DPT bound strictly improves on the channel-coding bound, i.e., $E_{DPT}(\rho) < \overline{E}(\rho)$, and it remains
an open question whether the DPT bound can offer improvement in any situation, thanks to its
additional degrees of freedom. It should be pointed out that the vector $(\alpha_1, \ldots, \alpha_k)$ that achieves
$E_{DPT}(\rho)$ is not always given by $(1/k, \ldots, 1/k)$ because the function $E(\alpha_1, \ldots, \alpha_k, q)$
is not convex in $(\alpha_1, \ldots, \alpha_k)$. At any rate, in all cases where the two bounds are equivalent,
namely, $E_{DPT}(\rho) = \overline{E}(\rho)$, this is interesting in its own right since the two bounds are obtained
by two different techniques that are based on completely different considerations. One advantage
of the DPT approach is that it seems to lend itself more comfortably to extensions that account
for moments of more general functions of the estimation error, i.e., $E\{g(|\hat{U} - U|)\}$, for a large
class of monotonically increasing functions $g$. On the other hand, the optimization associated with
calculation of the DPT bound is not trivial.

5 Example: Very Noisy Channel 

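As an example, we consider the so-called very noisy channel, which is characterized by

$$p(y|x) = p(y)[1 + \epsilon(x, y)], \qquad |\epsilon(x, y)| \ll 1, \quad \forall x \in \mathcal{X},\ y \in \mathcal{Y}. \qquad (47)$$

As is shown in [13, pp. 155-158], to the first order, we have the following relations:

$$C = \frac{1}{2} \max_q \sum_{x, y} q(x)\, p(y)\, \epsilon^2(x, y) \qquad (48)$$

$$E_0(\varrho) = \frac{\varrho}{1+\varrho} \cdot C, \qquad (49)$$

and therefore

$$E_r(R) = \max_{0 \le \varrho \le 1} \left( \frac{\varrho}{1+\varrho} \cdot C - \varrho R \right) = \begin{cases} \frac{C}{2} - R & R < \frac{C}{4} \\ (\sqrt{C} - \sqrt{R})^2 & \frac{C}{4} \le R \le C \\ 0 & R > C \end{cases} \qquad (50)$$

As for the expurgated exponent, we have

$$E_x(\varrho) = E_0(1) = \frac{C}{2} \qquad (51)$$

and so,

$$E_{ex}(R) = \sup_{\varrho \ge 1} [E_x(\varrho) - \varrho R] = \frac{C}{2} - R, \qquad (52)$$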
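which means that expurgation does not help for very noisy channels. This implies that $\rho_0 = 1$ and
so

$$\overline{E}(\rho) = \begin{cases} \frac{\rho}{1+\rho} \cdot C & \rho < 1 \\ \frac{C}{2} & \rho \ge 1 \end{cases} \qquad (53)$$

As for the lower bound, we have the following: For $\rho < 1$,

$$\underline{E}(\rho) = \sup_{0 \le \varrho \le 1} \frac{\rho}{\rho + \varrho} \cdot \frac{\varrho}{1+\varrho} \cdot C = \frac{\rho}{(1 + \sqrt{\rho})^2} \cdot C. \qquad (54)$$

The same result is obtained, of course, from the solution to the equation $\rho R = (\sqrt{C} - \sqrt{R})^2$. For
$\rho \ge 1$,

$$\underline{E}(\rho) = \sup_{\varrho \ge 1} \frac{\rho E_x(\varrho)}{\varrho + \rho} = \sup_{\varrho \ge 1} \frac{\rho}{\varrho + \rho} \cdot \frac{C}{2} = \frac{\rho}{1+\rho} \cdot \frac{C}{2}. \qquad (55)$$

Thus, in summary,

$$\underline{E}(\rho) = \begin{cases} \frac{\rho}{(1 + \sqrt{\rho})^2} \cdot C & \rho < 1 \\ \frac{\rho}{1+\rho} \cdot \frac{C}{2} & \rho \ge 1 \end{cases} \qquad (56)$$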



We see how the bounds asymptotically coincide (in the sense that $\overline{E}(\rho)/\underline{E}(\rho) \to 1$) both for very
large values of $\rho$ and for very small values of $\rho$ (see Fig. 1).

Figure 1: The upper bound $\overline{E}(\rho)/C$ (solid curve) and the lower bound $\underline{E}(\rho)/C$ (dashed curve) for
the example of the very noisy channel.
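The two curves of Fig. 1 can be tabulated directly from the closed-form expressions for the very noisy channel: the upper bound is $\rho C/(1+\rho)$ for $\rho < 1$ and $C/2$ otherwise, while the lower bound is $\rho C/(1+\sqrt{\rho})^2$ for $\rho < 1$ and $\rho C/[2(1+\rho)]$ otherwise. The following Python sketch, with hypothetical function names and with $C$ normalized to 1 by default, evaluates both.

```python
import math

def upper_bound(rho, C=1.0):
    """Channel-coding upper bound for the very noisy channel (rho_0 = 1)."""
    return rho * C / (1.0 + rho) if rho < 1.0 else C / 2.0

def lower_bound(rho, C=1.0):
    """Achievable exponent for the very noisy channel."""
    if rho < 1.0:
        return rho * C / (1.0 + math.sqrt(rho)) ** 2
    return rho * C / (2.0 * (1.0 + rho))

def ratio(rho):
    """Lower bound divided by upper bound (independent of C)."""
    return lower_bound(rho) / upper_bound(rho)
```

The ratio equals 1/2 at $\rho = 1$, where the gap is largest, and tends to 1 at both extremes.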
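As for the DPT bound, we have the following approximate analysis:

$$\begin{aligned}
& \inf_q \sum_{y \in \mathcal{Y}} \prod_{i=1}^{k} \sum_{x_i \in \mathcal{X}} q(x_i)\, p(y|x_i)^{\alpha_i} \qquad (57) \\
&= \inf_q \sum_{y \in \mathcal{Y}} \prod_{i=1}^{k} \sum_{x_i \in \mathcal{X}} q(x_i)\, p(y)^{\alpha_i} [1 + \epsilon(x_i, y)]^{\alpha_i} \qquad (58) \\
&= \inf_q \sum_{y \in \mathcal{Y}} p(y) \prod_{i=1}^{k} \sum_{x_i \in \mathcal{X}} q(x_i) [1 + \epsilon(x_i, y)]^{\alpha_i} \qquad (59) \\
&\approx \inf_q \sum_{y \in \mathcal{Y}} p(y) \prod_{i=1}^{k} \sum_{x_i \in \mathcal{X}} q(x_i) \left[ 1 + \alpha_i \epsilon(x_i, y) - \frac{1}{2} \alpha_i (1 - \alpha_i) \epsilon^2(x_i, y) \right] \qquad (60) \\
&= \inf_q \sum_{y \in \mathcal{Y}} p(y) \prod_{i=1}^{k} \left[ 1 - \frac{1}{2} \alpha_i (1 - \alpha_i) \sum_{x_i \in \mathcal{X}} q(x_i) \epsilon^2(x_i, y) \right] \qquad (61) \\
&\approx \inf_q \sum_{y \in \mathcal{Y}} p(y) \left[ 1 - \frac{1}{2} \sum_{i=1}^{k} \alpha_i (1 - \alpha_i) \sum_{x \in \mathcal{X}} q(x) \epsilon^2(x, y) \right] \qquad (62) \\
&= \inf_q \left[ 1 - \frac{1}{2} \sum_{i=1}^{k} \alpha_i (1 - \alpha_i) \sum_{x, y} q(x)\, p(y)\, \epsilon^2(x, y) \right] \qquad (63) \\
&= 1 - \frac{1}{2} \max_q \sum_{x, y} q(x)\, p(y)\, \epsilon^2(x, y) \cdot \sum_{i=1}^{k} \alpha_i (1 - \alpha_i) \qquad (64) \\
&= 1 - C \sum_{i=1}^{k} \alpha_i (1 - \alpha_i), \qquad (65)
\end{aligned}$$

where in (61), we have used the identity $\sum_x q(x) \epsilon(x, y) = 0$ for all $y$ with $p(y) > 0$ [13, p.
156, eq. (3.4.28)]. Thus,

$$\sup_q E(\alpha_1, \ldots, \alpha_k, q) = -\ln \left[ 1 - C \sum_{i=1}^{k} \alpha_i (1 - \alpha_i) \right] \approx C \sum_{i=1}^{k} \alpha_i (1 - \alpha_i) \qquad (66)$$

and then

$$E_{DPT}(\rho) = C \cdot \inf_{k > 1}\ \inf_{\alpha_1, \ldots, \alpha_k} \frac{\sum_{i=1}^{k} \alpha_i (1 - \alpha_i)}{\sum_{i=1}^{k} \zeta_\rho(\alpha_i)}. \qquad (67)$$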

The very same expressions are obtained for the continuous-time AWGN channel with unlimited
bandwidth, where $C = P/N_0$, $P$ being the signal power and $N_0$ being the one-sided noise spectral
density. For $\rho = 1$ and $k = 2$, we have $\zeta_1(\alpha) = \min\{\alpha, 1 - \alpha\}$:

$$E_{DPT}(1) \le C \cdot \inf_{0 \le \alpha \le 1} \frac{2\alpha(1 - \alpha)}{2 \min\{\alpha, 1 - \alpha\}} \qquad (68)$$

$$= C \cdot \inf_{0 \le \alpha \le 1/2} \frac{2\alpha(1 - \alpha)}{2\alpha} = \frac{C}{2}, \qquad (69)$$

which agrees with $\overline{E}(\rho)$ at $\rho = 1$. For $\rho = 2$ and $k = 2$, the minimum is attained for $\alpha = 1/3$, and the result
is $E_{DPT}(2) \le 8C/9$. However, for $k = 3$, the bound improves to $2C/3$.
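The numerical values quoted in this example can be reproduced with a short script. The sketch below assumes the first-order form of the DPT exponent for the very noisy channel, namely $C \sum_i \alpha_i (1-\alpha_i) / \sum_i \zeta_\rho(\alpha_i)$, and it assumes $\zeta_\rho(\alpha) = \min\{\alpha, (1-\alpha)/\rho\}$, which reduces to the quoted $\zeta_1(\alpha) = \min\{\alpha, 1-\alpha\}$ at $\rho = 1$. Both function names are hypothetical.

```python
def zeta(rho, a):
    """Assumed form of zeta_rho: min(a, (1 - a) / rho)."""
    return min(a, (1.0 - a) / rho)

def dpt_ratio(rho, alphas):
    """First-order DPT exponent divided by C, for a given weight vector.

    The weights must be non-negative and sum to one.
    """
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    num = sum(a * (1.0 - a) for a in alphas)
    den = sum(zeta(rho, a) for a in alphas)
    return num / den
```

Under these assumptions, `dpt_ratio(1, [0.5, 0.5])` gives 1/2, `dpt_ratio(2, [1/3, 2/3])` gives 8/9, and the equal-weight choice with $k = 3$, `dpt_ratio(2, [1/3, 1/3, 1/3])`, gives 2/3. A grid search over the two-component weights for $\rho = 2$ confirms that 8/9 is the smallest value attainable with $k = 2$.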

6 Extension to the Multidimensional Case 



Consider now the case of a parameter vector $\mathbf{U} = (U_1, \ldots, U_d)$, uniformly distributed across the unit
hypercube $[-1/2, +1/2]^d$. A reasonable figure of merit in this case would be a linear combination
of $E\{|\hat{U}_i - U_i|^\rho\}$, $i = 1, 2, \ldots, d$. Since each one of these terms is exponential in $n$, it makes sense
to let the coefficients of this linear combination also be exponential functions of $n$, as otherwise,
the results will be exponentially insensitive to the choice of the coefficients. This means that we
consider the criterion

$$\sum_{i=1}^{d} e^{n r_i} \cdot E\{|\hat{U}_i - U_i|^\rho\}, \qquad (70)$$

where, without loss of generality, we take $r_i \ge 0$, $\min_i r_i = 0$.

The derivation below is an extension of the derivation of the channel coding bound, given in 
Appendix A for the case d = 1. Therefore, a reader who is interested in the details is advised to 
read Appendix A first, or otherwise to skip directly to the final result in eq. (81) and the discussion
that follows. 

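Let us define $R_i = (r_i + \gamma)/\rho$ for some constant $\gamma \ge 0$. Consider the following chain of
inequalities:

$$\begin{aligned}
\sum_{i=1}^{d} e^{n r_i} \cdot E\{|\hat{U}_i - U_i|^\rho\} &\ge \sum_{i=1}^{d} e^{n r_i} \cdot e^{-n \rho R_i} \Pr\{|\hat{U}_i - U_i| \ge e^{-n R_i}\} \qquad (71) \\
&= \sum_{i=1}^{d} e^{-n(\rho R_i - r_i)} \Pr\{|\hat{U}_i - U_i| \ge e^{-n R_i}\} \qquad (72) \\
&= e^{-n\gamma} \sum_{i=1}^{d} \Pr\{|\hat{U}_i - U_i| \ge e^{-n(r_i + \gamma)/\rho}\} \qquad (73) \\
&\ge e^{-n\gamma} \Pr\left\{ \bigcup_{i=1}^{d} \left\{ |\hat{U}_i - U_i| \ge e^{-n(r_i + \gamma)/\rho} \right\} \right\} \qquad (74) \\
&\ge e^{-n\gamma} \cdot \exp\left\{ -n E_{sl}\left( \sum_{i=1}^{d} R_i \right) \right\}, \qquad (75)
\end{aligned}$$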

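where (71) follows from Chebychev's inequality, (74) follows from the union
bound, and (75) follows from the same arguments as in [7, Sect. IV.A]. Maximizing over $\gamma$,
we get

$$\sum_{i=1}^{d} e^{n r_i} \cdot E\{|\hat{U}_i - U_i|^\rho\} \ge \exp\left\{ -n \min_{\gamma \ge 0} \left[ \gamma + E_{sl}\left( \sum_{i=1}^{d} \frac{r_i + \gamma}{\rho} \right) \right] \right\}. \qquad (76)$$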



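Defining $R = (\sum_{i=1}^{d} r_i + \gamma d)/\rho$, $R_{\min} = \sum_{i=1}^{d} r_i/\rho$ and $\theta = \rho/d$, the above minimization at the
exponent becomes equivalent to

$$\begin{aligned}
& \min_{R \ge R_{\min}} \left[ \frac{\rho}{d} \cdot R + E_{sl}(R) - \frac{1}{d} \sum_{i=1}^{d} r_i \right] \qquad (77) \\
&= \min_{R \ge R_{\min}} [\theta R + E_{sl}(R)] - \theta R_{\min} \qquad (78) \\
&= \begin{cases} E_{sp}(R_{\rho/d}) + \frac{\rho}{d} (R_{\rho/d} - R_{\min}) & \rho/d < \rho_0 \\ E_{ex}(0) - \rho_0 R_{\min} & \rho/d \ge \rho_0 \end{cases} \qquad (79)
\end{aligned}$$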


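where $R_\theta$ is defined as the achiever of $\min_{R \ge R_{\min}} [\theta R + E_{sp}(R)]$. Thus, the extension of the channel-coding
bound to the $d$-dimensional case reads

$$\begin{aligned}
\overline{E}(\rho, d, r_1, \ldots, r_d) &= \begin{cases} E_{sp}(R_{\rho/d}) + \frac{\rho}{d} R_{\rho/d} - \frac{1}{d} \sum_{i=1}^{d} r_i & \rho < \rho_0 d \\ E_{ex}(0) - \frac{\rho_0}{\rho} \sum_{i=1}^{d} r_i & \rho \ge \rho_0 d \end{cases} \qquad (80) \\
&= \begin{cases} \bar{E}_0(\rho/d) - \frac{1}{d} \sum_{i=1}^{d} r_i & \rho/d < \rho_0 \\ E_{ex}(0) - \frac{\rho_0}{\rho} \sum_{i=1}^{d} r_i & \rho/d \ge \rho_0 \end{cases} \qquad (81)
\end{aligned}$$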

We see that when $r_i = 0$ for all $i$ (i.e., all weights are 1), it is the same channel-coding bound as
before, except that $\rho$ is replaced by $\rho/d$, that is, $\overline{E}(\rho/d)$. For $\rho \to \infty$, the bound tends to $E_{ex}(0)$,
which can be approached again by a low-rate code for a Cartesian grid in the parameter space.

At the other extreme, when $d$ is very large compared to $\rho$, so $\rho/d$ is small, construct a grid of
$e^{n(C-\epsilon)/d} \times e^{n(C-\epsilon)/d} \times \cdots \times e^{n(C-\epsilon)/d}$ points, quantize $\mathbf{U}$ and assign to each grid point a codeword of a
typical random code at rate $C - \epsilon$. Then the performance will be about $e^{-n\rho C/d}$. Therefore, as a
corollary of the above result, we have

$$\sum_{i=1}^{d} E\{|\hat{U}_i - U_i|^\rho\} \ge e^{-n[\overline{E}(\rho/d) + o(n)]}. \qquad (82)$$

Appendix A 

Proof of Theorem 1. We begin by using the Markov/Chebychev inequality:

$$E|\hat{U} - U|^\rho \ge \Delta^\rho \cdot \Pr\{|\hat{U} - U| \ge \Delta\}. \qquad (A.1)$$

Next we need to further lower bound $\Pr\{|\hat{U} - U| \ge \Delta\}$ and then maximize the r.h.s. over $\Delta$.
Equivalently, similarly as in [7], we may set $\Delta = e^{-nR}$ in the r.h.s. and maximize the bound w.r.t.
$R$. Let $E(R)$ be the reliability function of the channel. Then, similarly^6 as in [7, Theorem 1], we
have:

$$\Pr\{|\hat{U} - U| \ge e^{-nR}\} \ge e^{-n[E(R) + o(n)]} \qquad (A.2)$$

and so,

$$E|\hat{U} - U|^\rho \ge e^{-n\rho R} \cdot e^{-n[E(R) + o(n)]} = e^{-n[\rho R + E(R) + o(n)]}. \qquad (A.3)$$

The best^7 lower bound is obtained by maximizing the r.h.s. over $R$, yielding

$$\begin{aligned}
E|\hat{U} - U|^\rho &\ge e^{-n \min_{R \ge 0} [\rho R + E(R) + o(n)]} \\
&\ge e^{-n \min_{R \ge 0} [\rho R + E_{sl}(R) + o(n)]} \qquad (A.4)
\end{aligned}$$



^6 While ref. [7] is primarily about the continuous time additive white Gaussian noise (AWGN) channel, the arguments
in the proof of Theorem 1 therein are insensitive to this assumption. They hold verbatim here, provided that
the observation time $T$ there is replaced by the block length $n$ and the reliability function of the AWGN channel is
replaced by that of the DMC considered here.

^7 The reader might suspect that the use of Chebychev's inequality yields a loose bound. Note, however, that even
the exact relation $E|\hat{U} - U|^\rho = \rho n \int_0^\infty dR \cdot e^{-n\rho R} \cdot \Pr\{|\hat{U} - U| \ge e^{-nR}\}$, with $\Pr\{|\hat{U} - U| \ge e^{-nR}\} \ge e^{-n[E(R) + o(n)]}$,
would yield, after saddle-point integration, exactly the same exponential order as presented above. The weak link
here is, therefore, not the Chebychev inequality but the fact that there is no apparent single estimator, independent
of $R$, that minimizes $\Pr\{|\hat{U} - U| \ge e^{-nR}\}$ uniformly for all $R$.
where $E_{sl}(R)$ is the exponent associated with the straight line bound, which is well known to be an
upper bound on the reliability function $E(R)$ [9], [10], [13, Sect. 3.8], and which is given by

$$E_{sl}(R) = \begin{cases} E_{ex}(0) - \rho_0 R & 0 \le R \le R_0 \\ E_{sp}(R) & R_0 < R \le C \\ 0 & R > C \end{cases} \qquad (A.5)$$

where

$$E_{sp}(R) = \sup_{\varrho \ge 0} [E_0(\varrho) - \varrho R] \qquad (A.6)$$

is the sphere-packing exponent, $\rho_0$ is as defined in Theorem 1 and $R_0$ is the rate $R$ at which
$dE_{sp}(R)/dR = -\rho_0$, or equivalently, the solution to the equation $E_{sp}(R) = E_{ex}(0) - \rho_0 R$. Thus,
according to the second line of eq. (A.4),

$$\overline{\mathcal{E}}(\rho) \le \min_{R \ge 0} [\rho R + E_{sl}(R)]. \qquad (A.7)$$

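For $\rho \ge \rho_0$, the minimum is obviously attained at $R = 0$, and so,

$$\overline{\mathcal{E}}(\rho) \le \rho \cdot 0 + E_{sl}(0) = E_{ex}(0). \qquad (A.8)$$

For $\rho < \rho_0$, we use

$$\overline{\mathcal{E}}(\rho) \le \min_{R \ge 0} [\rho R + E_{sl}(R)] \le \min_{R \ge 0} [\rho R + E_{sp}(R)]. \qquad (A.9)$$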

The right-most side of eq. (A.9) is the Legendre-Fenchel transform (LFT) of $E_{sp}(R)$, which in turn
(according to (A.6)) is the LFT of $E_0(\varrho)$. Thus, the right-most side of (A.9) is given by the UCE
of $E_0(\rho)$, which is $\bar{E}_0(\rho)$. Thus,

$$\overline{\mathcal{E}}(\rho) \le \bar{E}_0(\rho), \qquad \rho < \rho_0. \qquad (A.10)$$

This completes the proof of Theorem 1.
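The Legendre-Fenchel step can be checked numerically on the very noisy channel of Section 5. There, $E_0(\varrho) = \varrho C/(1+\varrho)$ is concave, and carrying out the supremum in (A.6) gives $E_{sp}(R) = (\sqrt{C} - \sqrt{R})^2$ for $0 \le R \le C$ (this closed form is derived here for the example and is not quoted from the paper). The sketch below, with hypothetical function names and a plain grid search, minimizes $\rho R + E_{sp}(R)$ and compares the result with $\rho C/(1+\rho)$.

```python
import math

def e_sp_vnc(R, C=1.0):
    """Sphere-packing exponent of the very noisy channel (assumed closed form)."""
    return (math.sqrt(C) - math.sqrt(R)) ** 2 if R <= C else 0.0

def min_over_rate(rho, C=1.0, steps=100000):
    """Grid-search approximation of min over 0 <= R <= C of rho*R + E_sp(R)."""
    return min(rho * (k * C / steps) + e_sp_vnc(k * C / steps, C)
               for k in range(steps + 1))
```

For instance, `min_over_rate(0.5)` is close to $0.5/1.5 = 1/3$, in agreement with $E_0(0.5)$ for $C = 1$.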

Appendix B 

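Proof of Theorem 2. Define

$$R(\rho) = \frac{\underline{E}(\rho)}{\rho} = \begin{cases} \sup_{0 \le \varrho \le 1} E_0(\varrho)/(\varrho + \rho) & \rho < \rho_+ \\ E_0(1)/(1 + \rho) = E_x(1)/(1 + \rho) & \rho_+ \le \rho \le \rho_- \\ \sup_{\varrho \ge 1} E_x(\varrho)/(\varrho + \rho) & \rho > \rho_- \end{cases} \qquad (B.1)$$

Consider a grid of $M = e^{nR(\rho)}/2$ evenly spaced points along $\mathcal{U}$, denoted $\{u_1, u_2, \ldots, u_M\}$, where
$u_1 = -1/2 + e^{-nR(\rho)}$ and $u_M = 1/2 - e^{-nR(\rho)}$ (see also [7, Theorem 2]). If $\rho > \rho_-$, assign to
each point $u_i$ a codeword of a code of rate $R(\rho)$ that achieves the expurgated exponent $E_{ex}[R(\rho)]$.
Otherwise, do the same with a code that achieves $E_r[R(\rho)]$ (see [4, p. 139, Corollary 1] or [13,
Theorem 3.2.1]). Given $u$, let $f_n(u)$ be the codeword $\mathbf{x}_i$ that is assigned to the grid point $u_i$, which
is closest to $u$. Given $\mathbf{y}$, let $g_n(\mathbf{y})$ be the grid point $u_j$ that corresponds to the codeword $\mathbf{x}_j$ that
has been decoded based on $\mathbf{y}$ using the ML decoder for the given DMC. For every $R \ge 0$, we have:

$$\begin{aligned}
E\{|\hat{U} - U|^\rho\} &= E\left\{ |\hat{U} - U|^\rho \,\Big|\, |\hat{U} - U| \le e^{-nR} \right\} \cdot \Pr\{|\hat{U} - U| \le e^{-nR}\} \\
&\quad + E\left\{ |\hat{U} - U|^\rho \,\Big|\, |\hat{U} - U| > e^{-nR} \right\} \cdot \Pr\{|\hat{U} - U| > e^{-nR}\} \\
&\le [e^{-nR}]^\rho \cdot 1 + 1^\rho \cdot \Pr\{|\hat{U} - U| > e^{-nR}\} \\
&= e^{-n\rho R} + \Pr\{|\hat{U} - U| > e^{-nR}\}. \qquad (B.2)
\end{aligned}$$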

Now, it follows from the construction of the proposed scheme that if $R$ is the coding rate and the
spacing between each two consecutive grid points is $2e^{-nR}$, then the event $\{|\hat{U} - U| > e^{-nR}\}$ occurs
iff the ML decoder errs. Thus, $\Pr\{|\hat{U} - U| > e^{-nR}\}$ is exactly the probability of decoding error.
Considering the case $\rho > \rho_-$, this code is assumed to achieve the expurgated exponent, and so, this
probability of error is upper bounded by $e^{-n[E_{ex}(R) - o(n)]}$. Since $\rho R$ is an increasing function of $R$
and $E_{ex}(R)$ is a decreasing function, the best choice of $R$ is the solution to the equation

$$\rho R = E_{ex}(R) \qquad (B.3)$$

or, equivalently,

$$\rho R = \sup_{\varrho \ge 1} [E_x(\varrho) - \varrho R]. \qquad (B.4)$$

Below we show that the solution to this equation is given by

$$R = R(\rho) = \sup_{\varrho \ge 1} \frac{E_x(\varrho)}{\varrho + \rho} \qquad (B.5)$$

and for this choice of $R$, both exponents in the last line of (B.2) are given by

$$\rho R(\rho) = \sup_{\varrho \ge 1} \frac{\rho E_x(\varrho)}{\varrho + \rho}, \qquad (B.6)$$

which is exactly the expression of $\underline{E}(\rho)$ in the range $\rho > \rho_-$. In the range $\rho < \rho_+$, exactly the
same arguments hold, except that $E_{ex}(R)$, $E_x(\varrho)$ and $\sup_{\varrho \ge 1}$ are replaced by $E_r(R)$, $E_0(\varrho)$,
and $\sup_{0 \le \varrho \le 1}$, respectively. In the intermediate range, the same line of arguments holds once again,
with $\varrho = 1$ and $E_x(1) = E_0(1)$.
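The balancing of the two exponents can be illustrated on the very noisy channel of Section 5, using its random coding exponent from eq. (50). The sketch below solves $\rho R = E_r(R)$ by bisection, which is a choice made for this example rather than a step of the proof, and the function names are hypothetical.

```python
import math

def e_r_vnc(R, C=1.0):
    """Random coding exponent of the very noisy channel, eq. (50)."""
    if R < C / 4.0:
        return C / 2.0 - R
    if R <= C:
        return (math.sqrt(C) - math.sqrt(R)) ** 2
    return 0.0

def balance_rate(rho, C=1.0, iters=100):
    """Bisection for the rate R in [0, C] at which rho * R = E_r(R).

    rho * R is increasing and E_r(R) is non-increasing, so the root is unique.
    """
    lo, hi = 0.0, C
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if rho * mid < e_r_vnc(mid, C):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For $\rho = 0.25$ and $C = 1$, the balanced exponent `0.25 * balance_rate(0.25)` is close to $\rho C/(1+\sqrt{\rho})^2 = 1/9$, and for $\rho = 4$ it is close to $\rho C/[2(1+\rho)] = 0.4$, matching the closed-form lower bound of the very noisy channel in both ranges.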
It remains to show that $R(\rho)$ in (B.5) solves equation (B.4) for $\rho > \rho_-$, and then similar
arguments will follow for the two other ranges. Let $R(\rho)$ be defined as in (B.5) and let $R'(\rho)$ be
defined as the solution to (B.4). We wish to prove that $R(\rho) = R'(\rho)$. To this end, we will prove
that both $R(\rho) \ge R'(\rho)$ and $R(\rho) \le R'(\rho)$. To prove the first inequality, let $\varrho(R)$ denote the
achiever of $E_{ex}(R) = \sup_{\varrho \ge 1} [E_x(\varrho) - \varrho R]$. Then, by definition of $R'(\rho)$, we obviously have

$$\rho R'(\rho) = E_x(\varrho[R'(\rho)]) - \varrho[R'(\rho)] R'(\rho), \qquad (B.7)$$

i.e.,

$$R'(\rho) = \frac{E_x(\varrho[R'(\rho)])}{\varrho[R'(\rho)] + \rho} \le \sup_{\varrho \ge 1} \frac{E_x(\varrho)}{\varrho + \rho} = R(\rho). \qquad (B.8)$$

To prove the second (opposite) inequality, let $\varrho(\rho)$ be the achiever of $R(\rho)$, that is,

$$R(\rho) = \frac{E_x(\varrho(\rho))}{\varrho(\rho) + \rho} \qquad (B.9)$$

or, equivalently,

$$\rho R(\rho) = E_x(\varrho(\rho)) - \varrho(\rho) R(\rho). \qquad (B.10)$$

But the l.h.s. cannot exceed $\sup_{\varrho \ge 1} [E_x(\varrho) - \varrho R(\rho)] = E_{ex}[R(\rho)]$, and so,

$$\rho R(\rho) \le E_{ex}[R(\rho)]. \qquad (B.11)$$

Now, as mentioned earlier, the function $\rho R$ is increasing in $R$ whereas the function $E_{ex}(R)$ is
decreasing. Thus, the value of $R$ for which there is equality $\rho R = E_{ex}(R)$, which is $R'(\rho)$, cannot
be smaller than any value of $R$ for which $\rho R \le E_{ex}(R)$, like $R(\rho)$. Hence, $R(\rho) \le R'(\rho)$. This
completes the proof of Theorem 2.

Appendix C 

Derivation of a lower bound on the generalized rate-distortion function. Consider the minimization
of the generalized mutual information

$$I(U; \hat{U}) = -\int_{\mathcal{U}} d\hat{u} \prod_{i=1}^{k} \int_{\mathcal{U}} du_i\, p(u_i)\, p(\hat{u}|u_i)^{\alpha_i}. \qquad (C.1)$$

Similarly as in [18, Sect. IV, Example 2] and [6], since we are dealing with an exponentially small
estimation error level (small distortion), then for reasons of convenience, we approximate our distortion
measure $d(u, \hat{u}) = |u - \hat{u}|^\rho$ ($u, \hat{u} \in \mathcal{U}$) by

$$d'(u, \hat{u}) = |(\hat{u} - u) \bmod 1|^\rho, \qquad (C.2)$$
where

$$t \bmod 1 = \left\langle t + \frac{1}{2} \right\rangle - \frac{1}{2}, \qquad (C.3)$$

$\langle r \rangle$ being the fractional part of $r$, that is, $\langle r \rangle = r - \lfloor r \rfloor$. The justification is that for very small
distortion (the high-resolution limit), the modulo 1 operation has a negligible effect, and hence
$d'(u, \hat{u})$ becomes essentially equivalent to the original distortion measure $d(u, \hat{u}) = |u - \hat{u}|^\rho$. Using
the same reasoning as in [18, Sect. IV, Example 2] and [6], there is no loss of optimality by confining
attention to channels $p(\hat{u}|u)$ of the form $f(w)$ with $w = (\hat{u} - u) \bmod 1$. Thus, the minimization of
$I(U; \hat{U})$ reduces to the maximization of

$$U(f) = \prod_{i=1}^{k} \int_{-1/2}^{+1/2} dw\, [f(w)]^{\alpha_i} \qquad (C.4)$$

subject to the constraints

$$\int_{-1/2}^{+1/2} dw \cdot f(w) = 1 \qquad (C.5)$$

$$\int_{-1/2}^{+1/2} dw \cdot |w|^\rho f(w) = D. \qquad (C.6)$$

This optimization problem is not trivial, but we can find an upper bound on $U(f)$ in terms of $D$
for small $D$. We begin with the following bound for each one of the factors of $U(f)$:



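$$\begin{aligned}
\int_{-1/2}^{+1/2} dw \cdot [f(w)]^{\alpha_i} &= \int_{-1/2}^{+1/2} dw \cdot [f(w)]^{\alpha_i} \cdot \frac{(|w|^\rho + D)^{\alpha_i}}{(|w|^\rho + D)^{\alpha_i}} \qquad (C.7) \\
&= \int_{-1/2}^{+1/2} dw\, [f(w)(|w|^\rho + D)]^{\alpha_i} \cdot \frac{1}{(|w|^\rho + D)^{\alpha_i}} \qquad (C.8) \\
&\le \left[ \int_{-1/2}^{+1/2} dw \cdot f(w)(|w|^\rho + D) \right]^{\alpha_i} \left[ \int_{-1/2}^{+1/2} \frac{dw}{(|w|^\rho + D)^{\theta_i}} \right]^{1 - \alpha_i} \qquad (C.9) \\
&= (2D)^{\alpha_i} \left[ \int_{-1/2}^{+1/2} \frac{dw}{(|w|^\rho + D)^{\theta_i}} \right]^{1 - \alpha_i}, \qquad (C.10)
\end{aligned}$$

where $\theta_i = \alpha_i/(1 - \alpha_i)$ and the third line follows from Hölder's inequality. It remains to evaluate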



the integral

$$I = \int_{-1/2}^{+1/2} \frac{dw}{(|w|^\rho + D)^{\theta_i}}. \qquad (C.11)$$

To this end, we have to distinguish between the cases $\theta_i > 1/\rho$ and $\theta_i < 1/\rho$ (the case $\theta_i = 1/\rho$ can
be solved separately or approached as a limit of $\theta_i \to 1/\rho$ from either side). For the case $\theta_i > 1/\rho$,
letting

Ci = 

we can easily bound / as follows: 

/ = D-^' 

< CiD^^P- 
For 6i < 1/ p, we proceed as follows: 

/ = ^i/p-e. 

< ib^Ip-^ 

= 2D^Ip-^ 
= 2D^Ip-^ 
= 2D^Ip-^ 



+1/2 



1/2 {\w/D^IP\P + l)<^^ 



+ 00 



d{w/D^/p) 
{\w/D^Ip\p + lf^ 



+ l/(2Z)i/p) 



1/(2DVp) (1^1" + 1)^ 
+l/(2Di/p) 



{t" + 1)'' 

1/(2DVp) 



(max{tP, l]f^ 

1/(2DVp) 



max{t/'^sl} 

1 ^+1/(2DVp) ^ 

l/(2Di/p)- 



1 + 



t 



1 



1 + 



^pSi-^j^Si-xj p _ Y 

1 - 



< 



1 



Thus, defining = 2°' max{Q, 2^^' /(I - pQi)], we have 

-1/2 
'-1/2 



/ dw ■ [/H]"^ < (2L>)"'/ 

J-1/2 



(C.12) 



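where the function $\zeta_\rho(\cdot)$ is defined as in (40). Thus,

$$U(f) \le c \cdot D^{\sum_{i=1}^{k} \zeta_\rho(\alpha_i)}, \qquad (C.26)$$

where $c = \prod_{i=1}^{k} c_i$. Finally, it follows that

$$R(D) \ge -c \cdot D^{\sum_{i=1}^{k} \zeta_\rho(\alpha_i)} \qquad (C.27)$$

as claimed.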